The Development of a Sequentially Scaled Achievement Test



Richard C. Cox and Glenn T. Graham
University of Pittsburgh

A program of individualized instruction demands a reexamination of traditional testing procedures. In the typical learning situation, instructional materials and rate are held constant, and achievement testing at the end of some specified unit of work is designed to rank students according to varying levels of achievement. Individualized instruction, on the other hand, allows each individual student to set his own learning pace. Yet the performance criteria for successful completion of some specified unit of work are identical for all students (Coulson and Cogswell, 1965).

Items for achievement testing in the latter situation should be designed to indicate whether or not the required behaviors have been mastered, not to discriminate among individuals. Students must be compared to an absolute standard rather than to a normative standard, under which the student's score reflects his performance relative to that of other individuals. This distinction between norm-referenced and criterion-referenced measures has been made by Glaser (1963).
A content-referenced measure provides considerable information for making decisions concerning student advancement. For example, a score of 80 per cent indicates that a student has successfully mastered 80 per cent of the specified behaviors. However, unless the test is designed to measure performance on only one behavior, the total test score does not indicate which behaviors the student has or has not mastered. To obtain this information, student performance on each item must be examined.
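The point above can be illustrated with a short sketch. It uses invented 0/1 response data for two hypothetical students on a hypothetical five-behavior test. Both students earn the same 80 per cent total, but only the item-by-item examination shows which behaviors each has mastered. The function names are illustrative only.

```python
# Invented 0/1 response data (1 = behavior performed correctly) for two
# hypothetical students on a hypothetical five-behavior test.
student_a = [1, 1, 1, 1, 0]
student_b = [0, 1, 1, 1, 1]


def percent_score(responses):
    """Percentage of the specified behaviors performed correctly."""
    return 100 * sum(responses) / len(responses)


def mastered_behaviors(responses):
    """1-based numbers of the behaviors performed correctly (item-by-item check)."""
    return [i + 1 for i, r in enumerate(responses) if r == 1]


print(percent_score(student_a), percent_score(student_b))  # 80.0 80.0
print(mastered_behaviors(student_a))  # [1, 2, 3, 4]
print(mastered_behaviors(student_b))  # [2, 3, 4, 5]
```

The identical totals hide different sets of mastered behaviors.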

One solution to this rather laborious task would be a test in which the total score indicates the response pattern of the individual. Such tests have been employed in the investigation of attitudes and have been analyzed using the Guttman (1944) "Scalogram Analysis." Essentially, the analysis ranks scores from highest to lowest and ranks items from most favorable to least favorable. Theoretically, those students with the highest scores (highest being most favorable) would have answered only the most favorable items, those scoring low would have answered only the least favorable items, and so on. The analysis yields a coefficient of reproducibility, which indicates how well an individual's response pattern can be reproduced from knowledge of his total score. The value of .90 was arbitrarily established as an acceptable lower limit.
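The coefficient of reproducibility can be sketched in code. It is commonly written as 1 minus the ratio of "errors" to total responses. The sketch assumes items are already ordered so that a respondent with total score s ideally passes exactly the first s items. It counts an error for every cell that differs from that ideal pattern. This is one of several error-counting conventions, not necessarily the one used in the study. The data and function names are invented for illustration.

```python
def count_errors(patterns):
    """Count cells that differ from the ideal pattern implied by each total score.

    Assumes columns are ordered so that an ideal respondent with total
    score s passes exactly the first s items. This is one common
    error-counting convention; others differ in detail.
    """
    errors = 0
    for row in patterns:
        score = sum(row)
        ideal = [1] * score + [0] * (len(row) - score)
        errors += sum(1 for got, want in zip(row, ideal) if got != want)
    return errors


def reproducibility(patterns):
    """Coefficient of reproducibility: 1 - errors / total number of responses."""
    total_responses = len(patterns) * len(patterns[0])
    return 1 - count_errors(patterns) / total_responses


# Invented responses of five students to four items (1 = pass, 0 = fail).
patterns = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],  # total 3 but not the ideal [1, 1, 1, 0]: 2 errors
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

cr = reproducibility(patterns)
print(count_errors(patterns), round(cr, 3))  # 2 0.9
print(cr >= 0.90)  # True: meets the .90 lower limit
```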



Applying this technique to achievement testing would yield valuable information. Suppose the behaviors to be tested could be arranged in a sequential order and the test were scalable. A student who obtained a score of 5 would then have answered items 1, 2, 3, 4, and 5 correctly and no more. A student could not score 7 unless he answered items 1 through 7 correctly and did not answer any item beyond 7 correctly. Knowing the behaviors these items represent, the teacher, guidance counselor, or researcher could tell from the test score which behaviors the student has mastered and which he has yet to master. The present study is an attempt to develop such a test.
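The interpretation described above can be sketched as follows. On a perfectly scalable test, a score of s means the first s objectives in the sequence are mastered and the rest are not. A response pattern fits this model only if every pass precedes every failure. The objective labels below are placeholders rather than the study's objectives, and the function names are hypothetical.

```python
def interpret_score(score, objectives):
    """Split objectives into (mastered, not yet mastered), assuming a perfectly
    scalable test whose items follow the order of `objectives`."""
    return objectives[:score], objectives[score:]


def is_scale_type(pattern):
    """True if the 0/1 pattern is all passes followed by all failures."""
    score = sum(pattern)
    return pattern == [1] * score + [0] * (len(pattern) - score)


# Placeholder labels; the study's own objectives are listed in its figure.
objectives = [f"objective {i}" for i in range(1, 11)]

mastered, remaining = interpret_score(5, objectives)
print(mastered)   # the first five objectives
print(remaining)  # objectives 6 through 10

print(is_scale_type([1, 1, 1, 1, 1, 0, 0]))  # True
print(is_scale_type([1, 1, 0, 1, 1, 0, 0]))  # False: gap at item 3
```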

Procedure and Results 

The first step in the development of any test is the identification of objectives to be tested. In a test designed to be scalable, the objectives must be arranged sequentially. In this study the terminal objective to be tested was the student's ability to add two two-digit numerals involving carrying. Using this as a starting point, the question was asked, "What skills must have been mastered previously in order to master this objective?" With this question as a guide, the list of fifteen objectives presented in Figure 1 was developed.
From 2 to 5 tasks were constructed for each objective, except objective 7b, for which only one task was constructed. The total number of tasks for the entire test was 50. The tasks pertaining to each objective were combined to form one "item" for that objective. This procedure is similar to the "H-technique" suggested by Stouffer, Borgatta, Hays, and Henry (1953). As an example, consider the three tasks:

   20      36      54
 + 11    + 42    + 33

These tasks would compose one "item" testing objective 8. 
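The combining step described above can be sketched for the three tasks of objective 8. The text does not say how a student's task results were collapsed into a single pass/fail item. The passing rule here is therefore a hypothetical `min_proportion` parameter: by default every task must be correct, and a two-of-three rule is shown for comparison. The student's answers are invented.

```python
from fractions import Fraction


def combine_tasks(task_results, min_proportion=Fraction(1)):
    """Collapse one objective's tasks (1 = correct, 0 = incorrect) into a
    single pass/fail item. `min_proportion` is a hypothetical passing rule;
    by default every task must be answered correctly."""
    correct = sum(task_results)
    return 1 if Fraction(correct, len(task_results)) >= min_proportion else 0


# The three addition tasks for objective 8, with invented student answers.
tasks = [(20, 11), (36, 42), (54, 33)]
answers = [31, 78, 86]  # the last answer is wrong (54 + 33 = 87)
results = [int(a + b == ans) for (a, b), ans in zip(tasks, answers)]

print(results)                                 # [1, 1, 0]
print(combine_tasks(results))                  # 0 under the all-correct rule
print(combine_tasks(results, Fraction(2, 3)))  # 1 under a two-of-three rule
```

`Fraction` keeps the comparison exact, so a rule such as "two of three" is not subject to floating-point rounding.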

Following this procedure, a 15-item test was constructed. The test was administered to a kindergarten, a first-grade, and a second-grade class in order to obtain a wide range of ability levels.

The possible total score range was from 0 to 15. Students were ranked according to this total score, and the response pattern was plotted. This pattern indicated that some of the items were not in the correct position to obtain the maximum coefficient of reproducibility; that is, the postulated sequence of objectives was not empirically verified. The items were rearranged to yield the maximum reproducibility coefficient. The response pattern obtained after the items had been rearranged yielded a reproducibility coefficient of .961.
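The rearrangement step described above can be sketched in code. The text does not say how the authors searched for the best item order, and trying all 15! orders of a 15-item test is impractical. A common heuristic, assumed here, is to order items from the one passed most often to the one passed least often, then recompute the coefficient. This heuristic is not guaranteed to find the maximum for every data set. The error count uses one common convention, and the data and function names are invented.

```python
def reproducibility(patterns):
    """1 - errors / responses, counting cells that differ from the ideal
    pattern (first `score` items passed) for each respondent's total score."""
    n_items = len(patterns[0])
    errors = 0
    for row in patterns:
        score = sum(row)
        ideal = [1] * score + [0] * (n_items - score)
        errors += sum(1 for got, want in zip(row, ideal) if got != want)
    return 1 - errors / (len(patterns) * n_items)


def order_by_pass_rate(patterns):
    """Item (column) indices from most often passed to least often passed."""
    n_items = len(patterns[0])
    passes = [sum(row[j] for row in patterns) for j in range(n_items)]
    return sorted(range(n_items), key=lambda j: passes[j], reverse=True)


def reorder(patterns, order):
    """Rearrange every response pattern's columns into the given item order."""
    return [[row[j] for j in order] for row in patterns]


# Invented data in a postulated order A, B, C; B is actually the hardest item.
postulated = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]

order = order_by_pass_rate(postulated)
print(order)                                                  # [0, 2, 1]
print(round(reproducibility(postulated), 3))                  # 0.733
print(round(reproducibility(reorder(postulated, order)), 3))  # 1.0
```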

As a further revision, objectives 3, 7a, and 7b and their corresponding items were omitted. Objective 3 was dropped because it was dependent on a specific curriculum, and objectives 7a and 7b because of ambiguous directions. The final arrangement of items yielded a reproducibility coefficient of .977.
According to Guttman in Stouffer (1950), the coefficient of reproducibility is a necessary but not a sufficient criterion for scalability. The reproducibility coefficient for a given item cannot be less than the proportion of responses in its most frequently chosen category. For this reason, Guttman suggests that too many extreme items (80 per cent or more of responses in any one category) can spuriously raise the reproducibility coefficient. Herbert Menzel (1953) suggests a procedure which, taken in conjunction with the reproducibility coefficient, contributes further evidence of scalability. Menzel proposes a coefficient of scalability, which takes the marginal totals into account in determining the degree to which the individual's performance can be reproduced. The coefficient prevents one from spuriously attributing high scalability to a sample composed of many extreme items and/or individuals. A coefficient of .65 or better is established as a criterion. The scalability coefficient for the revised test was .902.
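The relation between reproducibility, marginal totals and extreme items can be sketched in code. It uses the item-marginal form of the coefficient of scalability often quoted in textbooks, S = (CR - MMR) / (1 - MMR). Here MMR, the minimal marginal reproducibility, is the mean over items of the proportion of responses in each item's most frequent category. Menzel's own treatment may also take individuals' marginals into account, which this sketch does not, so treat it as an approximation under that assumption. The error count follows one common convention, and the data and function names are invented.

```python
def reproducibility(patterns):
    """1 - errors / responses; an error is a cell that differs from the ideal
    pattern (first `score` items passed) for that respondent's total score."""
    n_items = len(patterns[0])
    errors = 0
    for row in patterns:
        score = sum(row)
        ideal = [1] * score + [0] * (n_items - score)
        errors += sum(1 for got, want in zip(row, ideal) if got != want)
    return 1 - errors / (len(patterns) * n_items)


def item_pass_rates(patterns):
    """Proportion of respondents passing each item."""
    n_people = len(patterns)
    return [sum(row[j] for row in patterns) / n_people for j in range(len(patterns[0]))]


def minimal_marginal_reproducibility(patterns):
    """MMR: mean over items of the proportion in each item's modal category."""
    rates = item_pass_rates(patterns)
    return sum(max(p, 1 - p) for p in rates) / len(rates)


def scalability(patterns):
    """Item-marginal form of the coefficient of scalability: (CR - MMR) / (1 - MMR)."""
    cr = reproducibility(patterns)
    mmr = minimal_marginal_reproducibility(patterns)
    if mmr == 1:
        raise ValueError("every item is answered identically by all respondents")
    return (cr - mmr) / (1 - mmr)


def extreme_items(patterns, cutoff=0.8):
    """Indices of items with at least `cutoff` of responses in one category."""
    return [j for j, p in enumerate(item_pass_rates(patterns)) if max(p, 1 - p) >= cutoff]


# Invented responses of five students to four items (1 = pass, 0 = fail).
patterns = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]

print(round(reproducibility(patterns), 3))                   # 0.9
print(round(minimal_marginal_reproducibility(patterns), 3))  # 0.75
print(round(scalability(patterns), 3))                       # 0.6
print(extreme_items(patterns))                               # [0, 1]
```

In this invented example the reproducibility coefficient reaches .90. The scalability coefficient, however, is only .60, below the .65 criterion, because two of the four items are extreme.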

Although the revised test met the criteria for scalability, it had never been administered in its present form. As a validation study, the final revision was administered to a different kindergarten, first-grade, and second-grade class. This new response pattern yielded a reproducibility coefficient of .970 and a scalability coefficient of .792.



Discussion 

The results indicate that it is indeed possible to develop a sequentially scaled achievement test. However, these results must be tempered by the fact that the test is based upon a restricted area of subject matter. At present, additional tests covering a wider range of objectives are being developed in other areas of mathematics. These areas include subtraction, addition, time telling, numeration, and money. The reproducibility coefficients obtained range from .8S to .96 on the initial test administration. Further replications with more complex skills and with larger and more heterogeneous samples would be desirable.

The results of the study should also be tempered with the realization that the item responses may be a function of prior educational experiences, in school or elsewhere, to which the students have been exposed. This is not to say, however, that it is [...]lating the content taught in the classroom one could dictate a series of objectives which would yield an empirically scaled test.

One obvious result of the study is that the logical ordering of objectives is not sufficient for the establishment of a scalable test. Empirical evidence must be obtained to verify or refute the postulated order.